Summarizing documents based on cue-phrases and references

Oana Postolache, Georgiana Puşcaşu, Laurenţiu Ghetu
“Al. I. Cuza” University of Iasi, Faculty of Computer Science
16, Berthelot St., 6600 – Iasi, Romania
{oanap, georgie, laug}@infoiasi.ro

Dan Cristea
“Al. I. Cuza” University of Iasi, Faculty of Computer Science
and Romanian Academy, Institute of Theoretical Computer Science – the Iasi branch
dcristea@infoiasi.ro

Abstract

The paper presents a method of building the structure of a discourse by combining indications on local structure given by cue-phrases with indications on global structure given by references found in the text. The discourse structure is then used to obtain focused summaries.

1 Introduction

It is generally accepted that a strong correlation exists between the structure of discourse and referentiality (Fox, 1987; Vonk et al., 1992; Cristea et al., 2000). On the other hand, document summarization can take advantage of knowing the structure. Marcu (2000), for instance, has shown how parameterized summaries of a document can be built provided its rhetorical structure is known. Putting these together, one arrives, without any surprise, at a strong interdependence between referentiality and summaries, mediated by the discourse structure. A simple argument in support of this interdependence is that a summary cannot be coherent if it contains dangling references.

In this paper we present a method to obtain coherent focused summaries based on the structure of the discourse, which is inferred partially from indications on local structure given by cue-phrases and partially from references found in the text.

The summaries that we obtain are extracts (Mani, 2001). We say that a summary of a document is focused on a certain discourse entity if the summary reveals in short what the document tells about the key-entity, within the context of the whole document. A possible scenario addressing the need for a focused summary is that of a user interested in reviewing scientific texts, in particular findings on a certain drug. Using Google or another search engine she gets a tremendous list of documents mentioning the searched entity. Since time does not allow her to read all the documents found, abstracts would be of value. The problem with a general abstract obtained by passing the task to a common abstracting engine is that the item searched for could be secondary to the theme of the document, in which case it will not be included in the generated abstract. The user would be interested to know, briefly, why that entity is mentioned in a document, hence the need for a focused abstract.

At the base of our method stays the assumption that, if summarization is the goal, a less precise discourse structure is sufficient. To obtain it, cue-words and phrases are good indicators of local structural interdependencies between elementary discourse units (edus); based on cue-phrases, elementary sentence-level trees (sdts) are inferred; they are then integrated into a global coherent discourse tree using indications on discourse structure brought by references, as outlined by veins theory (Cristea et al., 1998).

The paper is structured as follows: section 2 presents the method, sections 3 and 4 present the basics (veins theory and the resolution of anaphora), section 5 describes a set of consistency constraints for the discovery of sdts, section 6 describes how the sdts are integrated into a whole discourse tree based on references, section 7 presents the data employed in the experiments and some results, and section 8 discusses possible extensions.

2 The method

A text can be read in many ways. Practically each edu of the text gives a specific perspective from which to interpret the whole text. Such a perspective, centred on a certain edu, should be thought of as revealing the meaning of that particular edu in the overall context. Cristea et al. (1998) propose a theory that brings out such centred interpretations, cut out of a rhetorical-like discourse tree structure. The vein expressions defined there constitute means to look at edus from the inside out. Each vein expression, claiming to express what the text says about that specific edu in the overall context, also gives a way of summarising the text, focused on the entities mentioned in that edu.

Many discourse parsing methods have been described (Cole et al., 1995; Marcu, 2000; Cristea, 2000). Cristea (2000), for instance, presents an incremental discourse parsing method built on the principle that a text has the one discourse structure that displays the smoothest centering transitions (Grosz et al., 1995) along veins, as well as most of the references satisfied along the veins. These two criteria combine to score a number of plausible partial trees and to retain, at each step of the incremental development, the N best scored trees. Unlike Cristea (2000), the method proposed here does not first build a tree in order to accept it, if well scored, or to filter it out, if badly scored, but rather uses references as a guide during the development of the tree.

The other clue used in building the structure is given by cue-phrases (Knott and Dale, 1992; Marcu, 2000), which are used to build sdts. To do that, we rely on Soricut and Marcu's (2003) claim that the text span corresponding to one sentence is usually covered by one node in the structure (in more than 90% of the cases, according to them).

The preparatory phases consist of POS-tagging (Tufiş, 1999), syntactic tagging done by an FDG parser and NP-tagging (Ait-Mokhtar and Chanod, 1997). Then edus are detected (Puscasu, forthcoming) based on the identification of finite verbs and the detection of their syntactic roles. Local corrections, mainly due to cue-phrases, are also possible.

Next, sdts are built (as will be described in section 5). In parallel with, or following, the sdt-building phase, antecedents of anaphors are looked for by running the AR-engine (described in section 4). The chains of co-referential links are then used to sew the pieces together into a complete discourse structure (described in section 6).

Once the discourse structure is available, the vein expressions of the edus containing the entity searched for configure the output summary.

3 Veins theory

By using the RST notion of nuclearity, veins theory (VT) (Cristea et al., 1998; Ide and Cristea, 2000) reveals a "hidden" structure in the discourse tree, called vein, which makes it possible to determine for each unit of a discourse a domain of evocative accessibility (dea): a string of units where antecedents of anaphors belonging to the unit should be found.

The fundamental intuition underlying the unified view on discourse structure and accessibility in VT is that the RST-specific distinction between nuclei and satellites constrains the range of referents to which anaphors can be resolved; in other words, the nucleus-satellite distinction, superimposed over a tree-like structure of discourse, induces for each anaphor a dea.

The observations that underlie the computation of vein expressions in VT are given below. Discourse units are noted u1, u2, u3, and relations R, R1, R2. When units are used as arguments of relations, their nuclearity is marked by a superscript: n for nucleus and s for satellite. We say that "a unit u2 refers a unit u1" and understand "a referential expression belonging to unit u2 refers a discourse entity also referred from unit u1".

- a satellite or a nucleus can refer a nuclear sibling to its left: in sequences u1^n R u2^s or u1^n R u2^n, u2 can refer u1;
- a nucleus can refer its own left satellite: in sequences u1^s R u2^n, u2 can refer u1;
- a right satellite of a nucleus u cannot be accessed from another right sibling, nuclear or satellite, of u: in sequences (u1^n R1 u2^s)^n R2 u3^n or (u1^n R1 u2^s)^n R2 u3^s, u3 can refer u1 but not u2;
- a nucleus blocks the accessibility from a right satellite to a left satellite: in sequences (u1^s R1 u2^n)^n R2 u3^s, u3 can refer u2 but not u1.

VT contributes a view on top-down summarization, similar to Marcu's (2000), while also revealing how focused summaries can be produced.

4 Anaphora Resolution

In (Cristea and Dima, 2001) and (Cristea et al., 2002a) a framework is proposed that incorporates a general anaphora resolution (AR) engine and is able to accommodate different AR models. This approach sees the linguistic and semantic entities involved in the cognitive process of anaphora resolution as represented on three layers: the text layer, populated with referential expressions (res); the projected layer, where feature structures (in the following, projected structures, pss) are filled in with information fetched from the text layer; and the deep semantic layer, where discourse entities (des), actually a representation of the entities the discourse talks about, are placed. It is said that a ps is projected from an re and that a de is proposed or evoked by a ps.

Within the AR-engine framework an AR model is defined in terms of four components: a set of attributes and the corresponding types of the objects populating the projection and semantic layers; a set of knowledge sources (virtual processors) intended to fetch values from the text into the attributes of the ps; a set of matching rules and heuristics responsible for deciding whether the ps corresponding to an re introduces a new de or, if not, which of the existing des it evokes; and a set of heuristics that configure the domain of referential accessibility, establishing the order in which des have to be checked, or certain proximity restrictions.

In (Cristea et al., 2002a), pronominal as well as noun anaphora were investigated. To a great extent, the results confirmed the initial hypothesis, namely that models behave better and better as more features are fired. For a small corpus of about two pages taken from the novel "1984" by G. Orwell (where five characters were tracked, whose co-reference chains in the golden annotation had lengths of 23, 14, 3, 25 and 16 referential expressions), the best models tried reached 100% precision and a recall in the range 70% to 100%. In another study (Cristea et al., 2002b) the investigation was extended to cases generally considered difficult to tackle (co-reference resolution triggered by positional constraints, common noun anaphors and antecedents with disagreement in lemma, noun and pronoun anaphors displaying number disagreement with the antecedents, bridging anaphora, as well as anaphoric references other than strict co-references).

5 Consistency constraints for elementary discourse trees

In this section we propose a representation and a method for determining safe local dependencies between edus, contributed by cue words or phrases (in the following, called markers). The dependencies configure an elementary discourse tree structure covering mainly a sentence (sometimes even more than that), in which inner nodes are labelled with markers and terminal nodes with edu labels. Each node of the tree is also marked by a nuclearity function with values in the set {n, s} (for nuclear or satellite), such that at each level, of the two descendants of an inner node, at least one is marked n.

Example 1:
[John is determined to pass the NLP exam 1] so, because [he has missed many courses 2] and [was only vaguely implicated at the working sessions 3], [he will have a hard time until summer 4].

In this example, the notation indicates the segmentation in edus (in square brackets) and the cue words with an impact on the determination of the discourse structure (so, because, and):
- so indicates that a unit subordinate to the preceding one follows;
- because (following another cue word, as well as in a sentence-initial position) indicates that the unit it prefixes is subordinate to a unit that follows after it;
- and indicates a conjunction between two units of equal nuclearity that precede and, respectively, succeed it.

This leads to associating argument patterns with markers, as suggested in Figure 1:

    – so –        because –, –        – and –

Figure 1: Argument patterns of cue-phrases

Example 1 displays the following distribution of edus and markers: 1 so, because 2 and 3, 4.

On each branch stemming out of a marker in Figure 1, virtually any domain of arguments, obtained by a combination of the units lying in the text on the corresponding side, could be formed. We note these domains as ordered lists of discourse units, one list for each argument of so, because and and, with the nuclear arguments marked.

Among all possible combinations of lists of edus allowed by this scheme, we reject from the beginning the empty lists, as well as those whose content concatenation displays gaps in the sequence. Three pairs of argument-lists remain for so, two for because and four for and.

Among the 3 * 2 * 4 = 24 possibilities, many are still inconsistent. The following rules state further constraints (we note the lists M1, M2, etc.: M1 and M2 for so, M3 and M4 for because, M5 and M6 for and). They express natural conditions of tree well-formedness.

The "unique-root" rule:
There is one and only one marker that covers the sequence of all edus.
In the example above, a combination in which the lists of both so and and cover the whole range of edus does not obey this rule.

The "one-parent" rule:
There are no two lists Mi = Mj with i ≠ j.
This rule asserts the obvious condition in trees that it is impossible to have one text span which is an argument of two distinct markers. For instance, a combination in which the same list is assigned to two different markers contradicts this rule; a combination can contradict it more than once if several pairs of lists coincide.

The "nesting-arguments" rule:
If x∈Mi ∩ Mj with i ≠ j, then either Mi ⊆ Mj or Mi ⊇ Mj.
This rule states that it is impossible to have two inner nodes of the tree which cover crossing text spans on the terminal frontier. A combination in which 2∈M2 and 2∈M5, while neither M2 ⊆ M5 nor M2 ⊇ M5, does not obey this rule.

The "balanced-displacement" rule:
For any two edus x, y placed in sequence (x before y), at least one marker m exists such that x∈left subtree(m) and y∈right subtree(m).
This rule forbids the existence of dangling edus in an elementary discourse tree. It stems from the assumption that a sufficient number of markers are found in the text. Where the text contributes no marker, an empty cue-word ∅ is considered instead, with the default argument pattern – ∅ –. Any of the combinations of nuclearity labels (n, n), (n, s), (s, n) is possible for its arguments. In the example above, a combination does not obey this rule if, for edus 3 and 4, there is no marker with 3 in its left sub-tree and 4 in its right sub-tree.

Among the 24 possible combinations of lists of the above example, only one obeys all four rules: M1 = (1), M2 = (2, 3, 4) (for so), M3 = (2, 3), M4 = (4) (for because), and M5 = (2), M6 = (3) (for and), which is also the expected one, displaying the sentence-level tree of Figure 2.

    so( 1, because( and( 2, 3 ), 4 ) )

Figure 2: The elementary discourse tree of Example 1 (nuclear nodes filled-in)

The above rules, applied to sentence-initial markers, yield the integration of adjacent sentences into larger elementary trees.

6 The processing model

This section describes how the pieces can be sewn together. The idea is to integrate elementary trees into a global one by taking into consideration the references discovered by the AR-engine and arranging the edus the references belong to such that they lie mostly along each unit's vein expression.

Processing follows three phases:
1. determination of co-referential chains;
2. building sentence-level discourse trees (sdts) based on intra-sentence markers;
3. integration of sdts into a global discourse tree.

During the first phase, the AR-engine is run over the POS-tagged, FDG-analysed and NP-tagged text, as explained in section 4. The result is a set of co-referential chains of res. All res belonging to one such chain point to a unique de.

During the second phase, the syntactic constraints imposed by cue-phrases at (mainly) sentence level are applied, in order to arrive at a sequence of sdts, as explained in section 5. Then, during the third phase, the sentence-level trees are combined in sequence with the aim of obtaining one complete tree of the whole discourse.

Let us note that the model we describe is open to both an incremental and a pipe-line type of processing. In incremental processing, suppose a partial discourse tree (pdt) has been obtained from processing the text up to (and excluding) the current sentence. The current sentence s is submitted to the first phase, as described above, resulting in a set of co-reference relations from all anaphors contained in s. Then, as a result of running the second phase, an sdt t is obtained from s. Finally, the third phase integrates t into the pdt, leaving a larger pdt. In a pipe-line type of processing, the first phase is run over the whole document, leaving a set of co-referential expressions. In parallel with this phase or following it, the second phase is run over the whole text, leaving a sequence of sdts. Finally, the third phase is run in order to integrate the sequence of sdts into a global discourse tree by taking into consideration the set of co-references.

The following discussion applies to both processing models. We suppose the parser is in a state where the first i sdts have been combined into a partial discourse tree structure, pdt_i, and the next sdt under operation is sdt_{i+1}. This tree has to be combined with the developing tree pdt_i by adjoining an auxiliary tree obtained from it onto the right frontier of the developing tree (Cristea and Webber, 1997). An auxiliary tree of an sdt t consists of a relation node, with a dummy (foot) node as its left child and t as its right child. Figure 3 displays the adjoining operation. If the name of the relation rooting the auxiliary tree is ignored, two other problems still have to be dealt with: onto what node of the right frontier of the developing tree should the adjunction be directed, and what should be the nuclearity pattern of the two descendants of the relation node in the auxiliary tree (the foot node and sdt_{i+1})?

Figure 3: The adjoining operation (the auxiliary tree, made of a relation node with a foot node and sdt_i as children, is adjoined onto pdt_{i-1}, yielding pdt_i)

We define a function that records the number of co-references between edus belonging to different sdts, as follows. If SDT_D is the ordered set of all sdts over discourse D (indexed from left to right, as the text unfolds), and U_D is the set of all edus in D, then:

    f: SDT_D × P(U_D) × SDT_D × P(U_D) → N

(where, if x is a set, P(x) is the power set of x, and N is the set of natural numbers), defined as follows: if u_k is an edu on the terminal frontier of sdt t_i, and u_l is an edu on the terminal frontier of sdt t_j (* represents all edus on the terminal frontier of t_i, respectively t_j), then:

- f(t_i, u_k, t_j, u_l) = the number of antecedents belonging to unit u_k of t_i directly or indirectly referred by anaphors belonging to unit u_l of t_j;
- f(t_i, *, t_j, u_l) = the number of antecedents belonging to all units of t_i directly or indirectly referred by anaphors belonging to unit u_l of t_j (if two or more anaphors in u_l refer the same antecedent belonging to t_i, then f counts all of them);
- f(t_i, u_k, t_j, *) = the number of antecedents belonging to unit u_k of t_i directly or indirectly referred by anaphors belonging to all units of t_j;
- f(t_i, *, t_j, *) = the number of antecedents belonging to all units of t_i directly or indirectly referred by anaphors belonging to all units of t_j;
- f(t_i, u_k, t_j, head(root(t_j))) = the number of antecedents belonging to unit u_k of t_i directly or indirectly referred by anaphors belonging to those units of t_j that are contained in the head expression of the root of t_j.

The following rules give decision criteria with respect to the node of the right frontier of the developing tree pdt_{i-1} where the adjoining is to be operated:

Rule 1:
If […] f(t_q, u_l, t_i, *), where 1 ≤ p ≠ q […] (meaning: the elementary tree t_i comes in sequence after the elementary trees t_p and t_q), u_k is any unit of t_p and u_l is any unit of t_q, then, if n_k is the right frontier node of pdt_{i-1} (which contains both t_p and t_q) covering u_k, or the lowest node on the right frontier of t_p that contains u_k in its head expression, the auxiliary tree stemming from t_i is adjoined onto the node n_k.

Rules 2 to 4:
[…]

Rule 5:
If there is no p < i such that f(t_p, u_k, t_i, *) > 0, with u_k any unit of t_p, or if such a p exists but none of its u_k are visible on the right frontier of pdt_{i-1}, then t_i is adjoined onto the lowest node of the right frontier of pdt_{i-1}, as a satellite of it.

The root of the auxiliary tree being adjoined always keeps the nuclearity of the node where the adjoining is made. What still has to be decided is the nuclearity of the foot node (which gives the nuclearity of the node onto which the adjunction is made, let us call it u_k) and of its sibling (the current sdt t_i):

Rule 6:
The node u_k, where the adjoining is made, will always be nuclear.

Rule 7:
If f(t_p, head(root(u_k)), t_i, *) > 0 then t_i will be nuclear, otherwise it will be satellite.

We show how the model works on the following example:

Example 2:
[Maria went alone to the market 1] because [Simon had to stay at home with the baby 2]. [Simon is a good friend of mine 3] and [he also helped me in a number of situations 4]. For instance [he was very helpful 5] when […]

Figure 4: Sdts and references for Example 2 (a = Maria, b = Simon, c = the child, d = I, empty = any other REs)

Figure 4 shows the sequence of sdts and the co-reference chains after the first two phases (in a pipe-line processing). The four sdts are grouped together by three adjoining operations. At each step i, the function f(t_p, u_k, t_{i+1}, *) is computed for each p < i and any u_k belonging to t_p, then the rules described above are applied. Only the first step is detailed here. The sdt t_2 has to be adjoined onto the first pdt, which is t_1: f(t_1, u_1, t_2, *) = 0 and f(t_1, u_2, t_2, *) = 3 (corresponding to the 3 references to Simon in u_2, from u_3 and u_4). Applying rule 1, t_2 must be adjoined onto t_1 at the level of u_2. According to rule 6, u_2 will be nuclear, while t_2 will be satellite, according to rule 7 (the root of u_2 is the relation node because, whose head expression is u_1, and f(t_1, u_1, t_2, *) = 0). Figure 5 shows the correct tree, drawn by hand, compared with the computed one.

Figure 5: Correct (a) and computed (b) tree

The table below displays the vein expressions of the correct tree compared with those of the computed one.

    edu   golden           computed
    1     1                1
    2     1 2              1 2 7 9
    3     1 2 3 4          1 2 3 4 7 9
    4     1 2 3 4          1 2 3 4 7 9
    5     1 2 3 4 5        1 2 3 4 5 7 9
    6     1 2 3 4 5 6      1 2 3 4 5 6 7 9
    7     1 2 7            1 2 7 9
    8     1 2 7 8 9        1 2 7 8 9
    9     1 2 7 8 9        1 2 7 8 9

Let us try focused summaries for Maria and for the child. Maria is referred in edus 1, 7, 8 and 9 (see Example 2 and Figure 4). The longest vein expression of these edus (1 2 7 8 9) is the same in both the golden and the computed tree. Therefore, the summary focused on Maria will be:

    Maria went alone to the market because Simon had to stay at home with the baby. I think she has a lot of trust in him to let him alone with the child. You know how Maria is: she is not very hurried to give credit to anybody.

The child is referred in edus 2 and 7. The longest vein expression of these units is 1 2 7 in the golden tree and 1 2 7 9 in the computed tree. The summary focused on the child, obtained from the computed tree, will be:

    Maria went alone to the market because Simon had to stay at home with the baby. I think she has a lot of trust in him to let him alone with the child. She is not very hurried to give credit to anybody.

7 Data and experiments

The assumption on the correlation of vein structure with co-references was based on earlier experiments reported in (Cristea et al., 1998). On average, the results of experiments on Romanian and English texts revealed that 99.1% of the references obey this conjecture.

Around 50 manually discovered cue-phrase patterns were used in the sentence-level tree construction described in section 5. In order to validate the approach we ran two experiments. In the first experiment, sdts were built based on the information given by an FDG parser, and in the second, sdts were generated based on our approach. We used a two-page excerpt from the original English version of G. Orwell's "1984", which contained 45 sentences, out of which 19 were one-clause sentences, our attention being focused on the remaining 26 complex sentences. In the first experiment we used the FDG output both for extracting the units (the clauses) and for building the tree. It turned out that only 7 sentences (27%) could be resolved correctly, while another 6 were only partially correct. In the second experiment, for 20 sentences (76%) the method correctly indicated a unique sdt, while for the remaining 6 sentences more than one tree could be generated.

To validate the focused summarisation method guided by veins, a one-page text from "The Legends of Mount Olympus" by Al. Mitru, consisting of 62 edus, was used. 57 students, participants in the EUROLAN'01 summer school, were asked to extract a summary of the text, focused on Hefaistos, a secondary character in the extract. We then built a golden summary composed of the units voted for by more than a half of the judges (28 out of 57). For each judge, the recall and precision values were calculated. The following table presents the average of these values and the VT results:

                 Judges' results    VT's results
    Precision    74.26%             73.33%
    Recall       72.92%             64.71%

8 Discussions and further work

We are aware that errors can intervene in all processing steps of the described summarisation method (segmentation in edus, detection of sdts, detection of anaphoric links). Further investigation will have to establish the overall trust that can be placed in the proposed method.

An earlier investigation (Ide and Cristea, 2000) showed a correlation between the type of the anaphor (pronoun, proper noun, definite or indefinite noun) and the percentage of cases in which the antecedent is found along veins of the discourse structure. This suggests that the method of building the global structure of the discourse guided by references could be further refined by using scores to account for the type of anaphor.

The described method of inferring the discourse structure is deterministic in the sense that only one tree is obtained. Further development would have to transform it into a beam-search type of processing, close to the one described in (Cristea, 2000), in order to combine the contribution of cue-phrases and references with that given by centering. In this way, the problem of the proliferation of partial trees caused by cue-phrases with multiple patterns, presently ignored, could also be tackled.

As demonstrated by the example in section 6, the computed vein expressions have a tendency to be larger than needed, which yields longer summaries. More sophisticated integration rules, automatically learned from a corpus annotated for discourse structure, could fix this problem.

Finally, it should be noted that the structure of a discourse as a complete tree gives more information than strictly needed (at least for summarization purposes). An underspecified type of representation, keeping, for instance, only the vein expressions and not the whole tree, could be a better solution.

References

Ait-Mokhtar, S. and Chanod, J.-P. 1997. Incremental Finite-State Parsing. Proceedings of ANLP'97, Washington.

Cole, R.A., Mariani, J., Uszkoreit, H., Zaenen, A. and Zue, V. 1995. Survey of the State of the Art in Human Language Technology.

Cristea, D. 2000. An Incremental Discourse Parser Architecture. D. Christodoulakis (Ed.) Proceedings of the Second International Conference - Natural Language Processing - NLP 2000, Patras, Greece, June 2000. Lecture Notes in Artificial Intelligence 1835, Springer.

Cristea, D. and Dima, G.-E. 2001. An Integrating Framework for Anaphora Resolution. Information Science and Technology, Romanian Academy Publishing House, Bucharest, 4(3).

Cristea, D., Dima, G.-E., Postolache, O.-D. and Mitkov, R. 2002b. Handling complex anaphora resolution cases. Proceedings of the Discourse Anaphora and Anaphor Resolution Colloquium, Lisbon.

Cristea, D., Ide, N., Marcu, D. and Tablan, V. 2000. Discourse Structure and Co-Reference: An Empirical Study. Proceedings of The 18th International Conference on Computational Linguistics COLING'2000, Luxembourg.

Cristea, D., Ide, N. and Romary, L. 1998. Veins Theory: A Model of Global Discourse Cohesion and Coherence. Proceedings of Coling/ACL'98, Montreal.

Cristea, D., Postolache, O.-D., Dima, G.-E. and Barbu, C. 2002a. AR-Engine – a framework for unrestricted co-reference resolution. Proceedings of Language Resources and Evaluation Conference - LREC 2002, Las Palmas, vol. VI: 2000-2007.

Cristea, D. and Webber, B.L. 1997. Expectations in Incremental Discourse Processing. Proceedings of ACL/EACL'97, Madrid.

Fox, B. 1987. Discourse Structure and Anaphora. Cambridge University Press.

Grosz, B.J., Joshi, A.K. and Weinstein, S. 1995. Centering: a Framework for Modelling the Local Coherence of Discourse. Computational Linguistics, 21(2).

Ide, N. and Cristea, D. 2000. A Hierarchical Account of Referential Accessibility. Proceedings of The 38th Annual Meeting of the Association for Computational Linguistics, ACL'2000, Hong Kong.

Knott, A. and Dale, R. 1992. Using Linguistic Phenomena to Motivate a Set of Coherence Relations. Discourse Processes 18(1).

Mani, I. 2001. Automatic Summarization. Natural Language Processing series. John Benjamins Publishing Co., Amsterdam.

Marcu, D. 2000. The Theory and Practice of Discourse Parsing and Summarization. The MIT Press.

Puscasu, G. forthcoming. Elementary discourse unit segmentation. Dissertation thesis. “Al. I. Cuza” University of Iasi.

Soricut and Marcu. 2003. Sentence Level Discourse Parsing using Syntactic and Lexical Information. Proceedings of HLT/NAACL – 2003, Edmonton.

Tufiş, Dan. 1999. Tiered Tagging and Combined Classifiers. F. Jelinek, E. Nöth (Eds) Text, Speech and Dialogue, Lecture Notes in Artificial Intelligence 1692, Springer.

Vonk, W., Hustinx, L. and Simons, W. 1992. The Use of Referential Expressions in Structuring Discourse. Language and Cognitive Processes 7(3-4).
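
As an illustration of the consistency constraints of section 5, the following minimal Python sketch enumerates the 24 combinations of argument lists for Example 1 and keeps those that satisfy the four rules. Several things here are assumptions rather than the authors' implementation: the candidate lists in `CANDIDATES` are one possible reading (three pairs for so, two for because, four for and), all function names are hypothetical, and the balanced-displacement rule is approximated on the argument lists themselves instead of on sub-trees.

```python
from itertools import product

# Assumed candidate (left, right) argument lists for each marker of Example 1.
# These are an illustrative reading, not lists quoted from the paper.
CANDIDATES = {
    "so": [((1,), (2,)), ((1,), (2, 3)), ((1,), (2, 3, 4))],
    "because": [((2,), (4,)), ((2, 3), (4,))],
    "and": [((2,), (3,)), ((2,), (3, 4)), ((1, 2), (3,)), ((1, 2), (3, 4))],
}
EDUS = (1, 2, 3, 4)


def unique_root(combo):
    """Exactly one marker covers the whole sequence of edus."""
    covering = [m for m, (left, right) in combo.items()
                if set(left) | set(right) == set(EDUS)]
    return len(covering) == 1


def one_parent(combo):
    """No two argument lists are identical."""
    lists = [lst for args in combo.values() for lst in args]
    return len(set(lists)) == len(lists)


def nesting_arguments(combo):
    """Two lists that share an edu must be nested one in the other."""
    sets = [set(lst) for args in combo.values() for lst in args]
    for i, a in enumerate(sets):
        for b in sets[i + 1:]:
            if a & b and not (a <= b or b <= a):
                return False
    return True


def balanced_displacement(combo):
    """Each pair of adjacent edus is separated by some marker
    (simplification: checked on argument lists, not on sub-trees)."""
    return all(
        any(x in left and y in right for left, right in combo.values())
        for x, y in zip(EDUS, EDUS[1:])
    )


RULES = (unique_root, one_parent, nesting_arguments, balanced_displacement)


def consistent_combinations():
    """Return every combination of candidate lists obeying all four rules."""
    markers = list(CANDIDATES)
    result = []
    for choice in product(*(CANDIDATES[m] for m in markers)):
        combo = dict(zip(markers, choice))
        if all(rule(combo) for rule in RULES):
            result.append(combo)
    return result


if __name__ == "__main__":
    for combo in consistent_combinations():
        print(combo)
```

Under these assumptions a single combination survives: so takes edu 1 and edus 2 to 4, because takes edus 2 and 3 and edu 4, and and takes edu 2 and edu 3, which corresponds to the tree of Figure 2. A marker with several argument patterns would simply contribute more entries to its candidate list.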